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Abstract. In this article we study the problem of recovering the unknown solution of a 
linear ill-posed problem, via iterative regularization methods. We review the problem of 
projection-regularization from a statistical point of view. A basic purpose of the paper is the 
consideration of adaptive model selection for determining regularization parameters. This 
article introduces a new regularized estimator which has the best possible adaptive properties 
for a wide range of linear functionals. We derive non asymptotic upper bounds for the mean 
square error of the estimator and give the optimal convergence rates. 



1. Introduction 

The area of mathematical inverse problems is quite broad and involves the qualitative and 
quantitative analysis of a wide variety of physical models. Moreover, a considerable number 
of problems arising in different scientific and technical fields belong to a class of ill-posed 
problems. For example, geophysicists scan the earth's subsurface by recording arrival times 
of waves reflected off different layers underneath the surface, and try to determine a meaningful 
solution and to understand which features in the solution are statistically significant. 

From a statistical point of view, the problem can be seen as recovering an unobservable 
signal / based on observations 

(1.1) y( Xi ) = Af( Xi ) + Si, 

where A : F — > Y is some known compact linear operator defined over a separable Hilbert 
space F, with values in a separable Hilbert space Y and fixed observation 

scheme. We assume that the observations y(xi) G 1R and that the observation noise £j are 
i.i.d. realizations of a certain random variable e. Throughout the paper, we shall denote 
y = (y(xi))™ = i- In this article we study the problem of estimating / using fixed iterative 
methods. 

The best possible accuracy, regardless of any discretization and noise corruption is deter- 
mined by some a priori smoothness assumption on the exact solution /. Here, smoothness is 
given in terms of some index function r] on the spectrum de A* A by 

A VtP = {feF,f = r l (A*A)u,\\u\\<p} 
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where A VtP is called a source condition. For classical Hilbert scales, the smoothness is measured 
in terms of powers 77 (i) := with < /i < /i , yUo > 0. 

In a deterministic framework, the statistical model (jl.lj) is formulated as the problem of 
finding the best-approximate solution of 

Af = y, 

in the situation where only perturbed data y s are available with 

lb -/II <s. 

Here, S is called the noise level. It is important to remark that whereas in this case consistency 
of the estimators depends on the approximation parameter S, in (jl.lj) it depends on the number 
of observations n. 

In general, the best L 2 approximation A'y, where A* is the Moore-Penrose (generalized) 
inverse of A, does not depend continuously on the left-hand side y. We define the Moore- 
Penrose inverse in an operator-theoretic way by restricting the domain and range of A in such 
a way that the resulting restricted operator is invertible; its inverse will then be extended to 
its maximal domain V(A^) = K(A) + TZ(A) L , with 71(A) the range of the operator A and 
TZ(A) 1 - the orthogonal complement of the range of A. 

The inverse problems that we study in this article are called ill-posed problems because the 
operator A is compact and consequently equation (jl.lj) can not be inverted directly since A -1 
is not a bounded operator. Ill-posed problems are usually treated by applying some linear 
regularization procedure, often based on a singular value decomposition; see Tikhonov and 
Arsenin in j2S] . An interesting early survey of the statistical perspective on ill-posed problems 
is studied in great detail by O' Sullivan in |2T| . 

In practice however (jl.lj) is hardly ever considered. Instead, we project the problem onto 
a smaller dimensional space Y m of Y. This yields a sequence of closed subspaces Y m indexed 
by m 6 M n , a collection of index sets. Clearly, an important problem is thus how to choose 
subspace Y m based on the data. This can be done by selection of a cutoff point or by threshold 
methods. Choosing the right subspace will be called model selection. 

Sometimes this projection provides enough regularization to produce a good approximate 
solution, but often additional regularization is needed. Regularization methods replace an ill- 
posed problem by a family of well-posed problems, their solution, called regularized solutions, 
are used as approximations to the desired solution of the inverse problem. These methods 
always involve some parameter measuring the closeness of the regularized and the original 
(unregularized) inverse problem, rules (and algorithms) for the choice of these regularization 
parameters as well as convergence properties of the regularized solutions are central points 
in the theory of these methods, since they allow to find the right balance between stability 
and accuracy. The general principles of regularization for ill-posed problems are known. In 
particular, such principles have been established by A.N. Tikhonov. The literature on various 
regularization methods based on these general principles is extensive ( Engl, Hanke, and 
Neubauer 0, Gilyazov and Gol'dman [TlJ ). 
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In statistic, regularization, is associated to penalty based methods or thresholding methods 
or more generality to "smoothing" techniques. In applications, regularization offers a unifying 
perspective for many diverse ill-posed inverse problems, a wide range of problems concerned 
with recovering information from indirect and usually noisy measurements, arising in geo- 
physics, tomography and econometrics. One of the most important, but still insufficiently 
developed, topics in the theory of ill posed problems is connected with iteration regulariza- 
tion [TT|; i-e, with the utilization of iteration methods of any form for the stable approximate 
solution of ill-posed problems. Iterative regularization methods tend to be more attractive in 
terms of numerical cost and implementation, but a number of open questions remain in their 
theoretical analysis. 

In this article we propose an iterative regularized estimator for linear ill-posed problems. 
Necessary conditions for convergence are established. These conditions connect the choice of 
the regularization parameter (i.e., the iteration index) with the projection dimension. More- 
over, we prove that the iterative regularized estimator is optimal in the sense that the estima- 
tor achieves the best rate of convergence among all the regularized estimators. A recent work 
in this direction is developed by Loubes and Ludena in ^Hj, which discusses the problem of 
estimating inverse nonlinear ill-posed problems with different types of complexity penalties 
leading either to a model selection estimator or to a regularized estimator. 

The choice of the regularization sequence is here crucial, and a lot of work associated with 
the selection of a good regularization parameter can be found in the literature JUJj [T3j . 
When using iterative methods the problem is finding a good stopping criteria for terminating 
the iteration procedure. In this article we will use tools developed in the context of model 
selection via penalization, |2],(I], based on the use of concentration inequalities. 

Our article is organized as follows. Section 2 presents basic assumptions and a statement of 
the discretized inverse problem. In section 3 we discuss regularization methods and we prove 
consistency of the estimator when the regularization parameter is known. In section 4 we 
present our main result, prove optimality of an adaptive regularized estimator and give its rate 
of convergence. Finally, in the last section we introduce regularization by iterative methods 
for the solution of inverse problems and provide some examples to explain the properties of 
iterative regularization methods. 

2. Preliminaries 

2.1. Formulation of the problem and basic assumptions. 

We assume that the inverse problem is given by 

y = Af + e. 

where e is a centered random variable satisfying the moment condition !E(|e| p /cr p ) < p\/2 and 
E(e 2 ) = a 2 . 
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We also need some notations concerning the fixed design settings, x i: i = 1, . . . ,n. Define 
the empirical measure: 

1 n 

n 

t=i 

and the associated empirical norm 

1 n 

WvWl = \\y\\k = - ^2(y( x i)f 

n i=i 

as well as the empirical scalar product 

1 n 

<y,e > n = - y^EiyiU). 
i=i 

We assume A : F — > Y, F, Y separable Hilbert spaces. Let ( , ) stands for the inner product 
defined over F. We assume that the range of the operator A, 1Z(A), is closed in the sense that 
/ G Af(A)- 1 -, where M{A)- L is the orthogonal complement of the null space of the operator A. 

With this notation let J(f) = \\y — A/lln the quadratic risk function. We will denote by / 
the function that minimizes the risk (which may not be unique), defined as 

(2.1) / = argmin J(/), 

where the minimum is taken over all functions from F to Y. 

The solution of the problem mhij eF J(f) exists if and only if / is a solution of the normal 
equation 

(2.2) A*Af = A*y 

where A* : Y — > F is the adjoint operator of A (introduced via the requirement that for all 
/ e F and y G Y, (Ax,y) n = ±(x,A*y) holds). 

It is important to remark that the operator A* actually depends on the observation sequence 
Xi, % — 1, . . . n. If Y is generated by {<^-}jli and is such that this basis is orthonormal with 
respect to the L 2 (P n ) norm over Y, and A is the identity then A* = -(<Pj(xi))i,j- 

It is necessary to mention that the convergence rates can thus be given only over subsets 
of F, i.e., under a-priori assumptions on the exact solution /. We will formulate such a-priori 
assumptions, encountered typically in the inverse problem literature, in terms of the exact 
solutions by considering subsets of F given by some source condition of the form 

A^ p = {feFJ = (A*Arcu,\\cu\\<p} 
where < /x < /x , A*o > and use the notation 



(2.3) 



A,= \jA^ = n((A*Ar) 

p>0 
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These sets are usually called source sets, / G A^ p is said to have a source representation. 
The requirement that / be in A pp can be considered as an smoothness condition. 

2.2. Projection methods. 

For numerical calculations, we have to approximate the space F by a finite-dimensional 
subspace. Estimating over all F is in general not possible. One approach in this direction 
is regularization by projection, where the regularization is achieved by a finite-dimensional 
approximation through projection. 

Let M n be a collection of index sets (m G M n ,m = {ji . . . ,jd m })- We give a sequence 
Y± C ¥2 • • • C Y m . . . C Y whose of union is dense in Y . We assume 

dim(Y m ) = d m . 

Let Ily m be the orthogonal projector in the empirical norm over the subspace Y m and let 
A m = H Y A. Define F m = A*Y m , with A* m : Y m — ► F the adjoint operator of A m , and Tip m to 
be the orthogonal projector onto the subspace F m . Then, by construction 

K Fm = m m A)+Tl" Ym A. 

Thus, we shall assume that data are give through an orthogonal design, corresponding to 
an orthogonal projection H Y as 

(2-4) U Ym y = U Ym Af + U Ym e. 

With this notation we have that the best-approximate L 2 solution has the expression 

Il F m f = A ] m y m . 

for y m = U Y y in the domain of A^ m . In the following we shall denote f m = U.F m f- 

Our goal is to find the solution of the equation p. 1)1 in the finite-dimensional subspace F m 
of F. We have that for projection without regularization the choice of F m and of A m has many 
advantages. For noisy data and severely ill-posed problems the dimension of the subspace 
has to be rather low to keep total error estimate small, since for these problems the smallest 
singular value of A m decreases rapidly as m increases. To be able to use larger dimensions 
we have to combine the projection method with additional regularization methods, such as 
iterative methods P].[TT]. 



2.3. Singular value decomposition. 

As often A m is not of full rank, the singular value decomposition (SVD) of the operator A m 
is then a useful tool. Let (af, <pj, <pj)j em be a singular system for a linear operator A m , that is, 
Am4>j — a j i Pj an d A* m ifj = Cjipj] where {cr|}y 6m are the nonzero eigenvalues of the selfadjoint 
operator A* m A m (and also of A m /l*J, considered in decreasing order. Furthermore, {4>j}j^ m 
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and {(fj}j(z m are a corresponding complete orthonormal system of eigenvectors of A* m A m and 
A m A* m1 respectively. For general linear operators with an SVD decomposition, we can write 



(2-5) A m f = ^2a j (f,<l> j )(p j 

jem 

(2-6) A* m y m = °j (Vm, ¥j)<Pj- 



jem 

For y m in the domain of A^, V(Al n ), the best-approximate L 2 solution hast the expression 

Note that for large j, the term grows to infinity. Thus, the high frequency errors 
are strongly amplified. We will asume that Oj = 0(j~ p ) for some p > 1/2, which is clearly 
related to the ill-posedness of the operator A and the approximation properties of Y m . For the 
construction and analysis of regularization methods, we will require some general notation 
for functions of the operators A* m A m and A m A* m . 

Let E\ be the spectral decomposition of A* m A m given by 

Ex{-)= E 

cr 2 <\j,j£m 

and H\ the spectral decomposition of A m A* m . Then E\ is an orthogonal projector and projects 
onto 

span{0j | j G to, a 2 < A}. 
Since (a 2 ; <pj) is an eigensystem for the selfadjoint compact operator A* n A m , 

A* m A m f = Y,° 2 3 UA 3 )h 

holds, which will be written (using the definition of the integral below) as 

A* m A m f = J XdE x f 

for / G V(A m ). Here the limits of integration could be and ||A m || 2 + e for any e > 0. We 
sometimes omit the limits of integration. 

This, motivates the definition 
(2.7) G(A* m A m ):= jG(\)dE x := £ G(a 2 )(; ft) ft 

cr 2 =Xj,jem 

of a (piecewise) continuous function G of a selfadjoint linear operator on F m . If A* A is 
continuously invertible, then (A*A)~ l = j jdE X - 



ITERATIVE METHODS FOR LINEAR INVERSE PROBLEMS 



7 



In this case the best-approximate L 2 solution, for y m in the domain of A^, can be charac- 
terized by the equation 

(2.8) f m = Aly m = J j dE x A* m y m . 

If G(A* m A m ) is defined via then for / e V(G 1 (A* m A m )) and # e V{G 2 {A* m A m )) 



(2.9) (Gi(A* A m )/, G 2 (A^A m )5() = J G X {X)G 2 {X) d(E x f, g) 

and 

(2-10) HG(A*A0/|| 2 = / G(A)rf||E A /|| 2 . 



The source set, A^ ()2.3|) . can be characterized via the singular values as follows: 



3. Regularization methods 

After the general considerations of the last section, we now explain the construction of 
a regularization method for the important special case of selfadjoint linear operators. The 
basic idea for deriving a regularization method is to replace the amplification factors 1/Xj by 
a filtered version Q(\j, a), where the filter function is a piecewise continuous, nonnegative and 
nonincreasing function of A on the segment [0, ||A m || 2 ] for a regularization parameter a > 0. 
The assumptions over the regularizing coefficients Q a (X) are technical and are given in order 
to control fluctuations over set [0, ||74 m || 2 ]. 

The filter family {Q(Xj,a)}j Em approximates the function A -1 for a — > oo. Intuitively, 
a regularization on A m should then be the replacement of the ill conditioned operator A} m 
by a family {R(Xj, a)}j em : Y m — > F m of continuous operators. Throughout all the article, 
we shall denote {Q(Xj,a)}j em and {R(Xj, a)} 3 - 6m by Q a and R a , respectively. Obviously, for 
all a > 0, R a is bounded. 

As the approximation of f m , we then take 

fm,a Q a^A m A rn ^ A m y m RaVmi 

where R a := J Q a (X)dE x A* m . 

Remark 3.1. Note that with the above notation 

(3.1) fm^ = R a y m = R a IlY m Af + RoJ^Y m £ - 

Also that we can write 

Ra Qa{A m A m jA m . 
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The next theorem gives conditions under which the first term in (13. 1|) converges to 
f m = UF m f- The proof follows that of jH], but we include it for the sake of completeness. 

Theorem 3.2. Let, for all a > 0, Q a : [0, ||A m || 2 ] -> JR be a piecewise continuous and 
nonincreasing function of X on the segment [0, ||y4 m || 2 ]. Assume also that there is a C > 
such that 

\XQ a (X)\<C, 

and 

lim Q a (X) = — 
for all A G (0, ||A m || 2 ). Then, for all y G D(A\ n ), 



lim Q a (A* m A m )A* m U^ m Af = f m 

holds with f m = A^ m y m . 

Remark 3.3. In order to assume convergence as a — > oo, it is necessary to choose Q a such 
that it approximates 1/A for all A G (0, ||A m || 2 ]. Also, note that the condition \XQ a (X)\ < C 
implies that \\A m R a \\ = \\A m A^ n Q ,(A^ n A m )\\ < C, i.e, ||^4 m i2 a || is uniformly bounded. 

Proof. As in [9\, if f m is defined by (|2.8p . then by (|2.2|) the residual norm has the representation 
\\fm — Qct{A* m A m )A* m A m f\\ 2 = ||(J — Q a (A* m A m )A* m A m )f m \\ 2 

From the formula (|2.10jh it follows that 

r\\ A ™\\ 2 + 

\\f m -Q a (A* m A m )A* m A m f\\ 2 = / (l-XQ a (X)) 2 d\\E x f m \\ 2 . 

Jo 

Since (1 — A<5a(A)) 2 is bounded by the constant (1 + C) 2 , which is integrable with respect 
to the measure G?||i?A,/m|| 2 , then by the Dominated Convergence Theorem, 

r\\A m \\ 2 + /■|| J 4 m || 2 + 

(3.2) lim / (l-XQ a (X)) 2 d\\Ej m \\ 2 = / lim (1 - XQ a (X)) 2 d\\E\f m \\ 2 . 
a ^°°Jo Jo a ^°° 

Since for A > 0, lim a _ +00 (l — XQ a (X)) = then the integral on the right-hand side of (|3.2j) 
equals to 0. On the other hand, if A = 0, lim Q ,_ >00 (l — XQ a (X)) = 1 then the equation (J3.2|) 
has the form 

r\\Am\\ 2 + 

(3.3) lim / (l-AQ a (A)) 2 d||£; A / m || 2 - lim ||E A / m || 2 - ||E / m || 2 

which is equal the jump of || -Ea /-m, || 2 at A = 0. Since f m G A/'fim) 1 , the term on the right- 
hand side of (|3.3jl equals to 0. Thus, R a A m f converges to f m as a — > oo for y m G D(A$ n ), 
which ends the proof. 

□ 
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Let Tr(B) the trace of the selfadjoint operator B t B for any square matriz B, which is 
defined by 

rr(B) = ij> 

for bj eigenvalues of B l B. 

We then have the following result, 

Theorem 3.4. Let Q a be as in theorem \8. 6 A Let p, p > and let : (0, ao) —>■ R be such 
that for all a G (0, a ) and X G [0, a 2 } 

sup A M |1 - XQ a (X)\ <Lu^a) 

0<A<<rf 

holds. Then for f G A^ iPl the following inequality holds true 

(3.4) E||/ m - / Q , m || 2 < 2^(a) 2 p 2 + 2 ( x 2 Tr(g 2 (^A m )A m ^). 

Proof. The proof of this inequality is based on the definition of the estimator f m>a and on 
the assumptions over this function. We have that the L 2 — norm of the difference between the 
regularized function and the true data function can be bounded by 

(3.5) E||/ m -/ m , a || 2 < 2]E\\f m - R a A m f\\ 2 + 2]E\\R a A m f - f m , a \\ 2 
where = R a y m . 

This is the typical bias-variance decomposition. The first term on the right-hand side is 
an approximation error, which corresponds to the bias, whereas the second term, variance, is 
a stability bound on the regularizing operator R a . Note that by the Theorem 13 .2\ the first 
term in ()3.5|) goes to if y m G VfT^). 

Let uj G F m with ||a;|| < p. Since / G F m then U.F m f = (A* m A m Yuj. On the other hand, 
A M sup A |1 — XQ a (X)\ < ui[j.(a), then the first term in this equation can be bounded by 

2 TCMI f r\ I A* A \ A* A t l|2 



|2 



M\\f m - R a A m f\\ 2 = E||/ m - Q a (A* m A m )A* m A m f, 
= n\(I-Q a (A* m A m )Al l A m )f rt 

^22 
< ■ 

In order to control the term corresponding to the variance we used that the data pertur- 
bation is white noise. Thus, 

n\R a Aj - f m , a \\ 2 = E<e, {Q a {A* m A m )A* m )*Q a {A* m A m )A* m e) 
= TE(e,Ql(A* m A m )A rn A* rn e) 
= a 2 Tr(Q 2 a (A* m A m )A m A* m ) 
which yields the desired result. □ 
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The next result will be useful when studying iterative methods. 

Theorem 3.5. Let Q a be as in theorem VS '.M Assume also that Q a is continuously differen- 
tiable and that the function |1 — AQ Q ,(A)|'|AQ a (A) — doest not decrease. Then the estimates 
are valid 

SUp \Qa{X)\=Q a {0), 
0<A<o-2 

and 

sup A"|l - AQ a (A)| < + l)" 1 ^^) 

0<A<o-f 

where u^(a) = <5 Q (0) _A \ 

Proof. The proof can be carried out by standard techniques. A proof of this result can be 
found in [TT]. □ 



4. Rates of convergence for the regularized estimator 



In any regularization method, the regularization parameter a plays a crucial role. For 
choosing the parameter, there are general methods of parameter selection. For example, 
the Discrepancy Principe [20], Cross- Validation [Z| and the L-curve [10.. They differ in the 
amount of a priori information required as well as in the decision criteria. The appropriate 
choice of regularization parameter is a difficult problem. We would like too choose a, based 
on the data in such a way that optimal rates are maintained. This choice should not depend 
on a priori regularity assumptions. 

Our goal is to introduce adaptive methods in the context of statistical inverse problems. 
In this section we introduce our adaptive estimator, for a fixed m = itlq. We choose mo such 
that ||/ — rii? mo /|| 2 satisfies the optimal rates with high probability since we know 

ii/ - n Fm jw 2 <\\i- u Ymo r = o{d- m y) 

for a certain p and < /i < 1/2. It is satisfied if the dimension of the set is such that 

(4.1) d mo >n^. 
This leads to the rate 

||/-/m,aH = 0(n 4 W +2 P+ 1). 

Analogous results are obtained in the case of Hilbert scales ([T2].[To]). 

Adaptive model selection is a technique which penalizes the regularization parameter, in 
such a way that we choose f mo ,a- by minimizing 
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where 
and 



arg min ( \\R ak (y m - A m f)\\ 2 + pen(a fc ) ) 



k = arg min ( \\R ak (y m - A m f)\\ 2 + pen(a fc ) ) 



pen(a fc ) = ra 2 (l + L k )[Tr(Ri k R ak ) + p 2 (R ak )}, 

with r > 2 and L k is a sequence which is incorporated in order to control the complexity of 
the set K — {1, 2, . . . , k n }, of all possible index up to k n . Here p 2 (B) = p(B t B) is the spectral 
radius of the selfadjoint operator B t B for any square matriz B, which is defined by 



p 2 (B) = — maxfej 

n j£m 



for bj eigenvalues of B t B. 



Thus, k is selected by minimizing 



(4.2) arg mm ( \\R ak {y m ~ A m f)\\ 2 + 



ra 2 (l + L k ) 



n 



^g^A^A^ + maxQ^A^A 



The strategy as proposed in this article automatically provides the optimal order of accu- 
racy. The regularized estimator has a rate of convergence less or equal than the best rate 
achieved by the best estimator for a selected model. We have the following result, 

Theorem 4.1. For any f G F m and any a k the following inequality holds true for d a positive 
constant that depends on r (as in Lemma\4~4\) , 



(4.3) E||/ m - f ai \\ 2 < 7T~~~7J ™nC(l + v)\\L ~ UJ 2 + 2penK)] + ( ' lUl) 



(1 — v) fee/c 



n 



where C\{d) = A a 2 J2 k 



np 2 (R ak ) 



dr Li 



Tr(R% k R ak ) 



+ 1 



dr-Li 



Tr(R t ak R ak ) 



+1 



Remark 4.2. An important issue is that equation (|4.3J) is non asymptotic. The goodness 
of fit of the estimator is defined by trace, Tr(i?^,i? a ) ; and spectral radius, p 2 (R a ). Also, 
the estimator is optimal in the sense that the adaptive estimator achieves the best rate of 
convergence among all the regularized estimators. 



Remark 4.3. Remark that under our assumptions, namely that the basis is orthonormal for 
the fixed design, both np 2 (R k ) and Tr{R t k Rk) / p 2 (R k ) do not depend on n. 
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Proof. For any f ak and any k G IN 

\\Ra k (y m ~ A m f a . k )\\ 2 + pen(a je ) < \\R ak (y m - A m f ak )\\ 2 + pen(a k ) 

and 

\\R ah (y m ~ A m f ak )\\ 2 = \\R ak A m (f - f ak )\\ 2 + 2(R ak A m (f - f ak ),R ak U.y m e) + \\R ak YlY m e\\ 2 
Thus, following standard arguments we have 

\\R a ,A m (f ~ fa k )\\ 2 

< \\R ak A m (f — f ak )\\ 2 - 2 < R ak A m {f-f aj .),R aj .U^ m e > 

+2 < R ak A m {f - f ak ),R ak U Ym e > -\\R a .U^ m e\\ 2 + \\R ak U^ m e\\ 2 + pen(a fc ) + pen(a fe ). 

Let < v < 1. Since the algebraic inequality 2ab < va 2 + M? holds for all a, b G H, we 
find that 

(i-z/)||i? Q .A m (;-/ Q .)|| 2 

< (1 + */)||i4 fc An(7 - /aJU 2 + 2pen(« fc ) + 2sup{i|| J R Qfe n^|| 2 - pen(« fc )}, 

holds for any k and f ak G F m . 

On the other hand, using that is 1 < ||i2 Qfe A|| < C, we have that for any f ak G F mo and 
any /c G IN, 

(l-i/)||/ ra -/«J 2 < C(l + u)\\f m -f ak \\ 2 
+ 2pen(a fc ) + 2C 1 sup {\\R ak Tl^ m e\\ 2 - pen(a fe )}. 

The proof then follows directly from the following technical lemma ([H|, which charac- 
terizes the supremum of an empirical process by the regularization family. 

Lemma 4.4. Let r](A) = \/e t A t Ae = \\Ae\\. Then, there exists a positive constant d that 
depends on r/2 such that the following inequality holds 

(4.4) P(V 2 ( A ) > o- 2 [Tr(A t A) + p{A t A)]r/2{l + L) + a 2 u) 

< exp{-v/d(l/p(A*A)M + r/2L[Tr{A t A)/p(A t A) + 1])}. 

With the above notation, 

r](R ak ) = ||-Rfc£ m || 

where e m = 11^ e. 

Now, with this lemma we have 
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P(sup ||P afc £ m || 2 - pen(a fc ) > o 2 x) 

a k 

< Y. P ^\R« k ) > ra 2 {l + L k )\Tr{R t a R ak )+p 2 {R ak )}+a 2 x] 

k 

< ^expl-y/^l/p^^Jx + rL^Tr^^J/p^^J + 1])} 

k 

Since for X positive EX = J °° P(X > u) du, we then have that 

POO 

IE[sup ||P afc £ m || 2 - pen(a fe )] = / P[sup ||P Qfc e m || 2 - pen(a fc ) > x] dx 

"fc JO "ft 

poo 

= a 2 P[sup || R ak e m || 2 — pen(a fc ) > a 2 u] du 

JO a k 
poo 

< a 2 / exp{ — \/kiU + k 2 } du. 
k Jo 

where h = d/p 2 (R ak ) and k 2 = drL k [Tr(R T ak R ak ) / p 2 (R ak ) + 1]. 
Let w = k\u + k 2 , then 

/"°° 1 - 
E[sup||P Qfc e m || 2 -pen(a fe )] < cr 2 ^ / 7T ex P{~vM 

a k , Jko M 



dw 



k Jk 2 

2 

h 



Finally, we have the desired result. 



CM 



n\fm - f« k || 2 < mf [C(l + u)\\f m - f ak || 2 + 2pen(a fc )] + 

where d(rf) = 4 a 2 ^< ]rL ' 



n 



Tr(Rj k R ak ) 



+ 1 



+ 1 



dr Li 



Tr(Rt ak R ak ) 

p'HRc k ) 



+ 1 



□ 



5. REGULARIZATION BY ITERATIVE METHODS 

Iterative regularization methods, are very competitive methods for linear inverse problems. 
In iterative regularization, one picks an initial guess f for the unknown /, and then one 
iteratively constructs updated approximations via a regularization scheme. The regularization 
parameter associated with iterative regularization is thus the "stopping point" of the iterative 
sequence, and an important part of the mathematical theory is the development of stopping 
criteria for terminating the iteration. In other words, the iteration index plays the role of 
the regularization parameter a, and the stopping criteria plays of the parameter selection 
method. 



14 ITERATIVE METHODS FOR LINEAR INVERSE PROBLEMS 

5.1. Descent Methods for Linear Inverse Problems. 



As an example of iterative regularization, we consider descent methods. Descendent meth- 
ods have become quite popular in the last years for the solution of linear inverse problems 
and for nonlinear inverse problems jX Xg . In this subsection we consider two examples. 

As an approximation of f m we will choose / mjQ such that 

(5-1) fm,a — [I — ^m^rnQa{^Cn^-rn)]fQ + Qci(^^m) A^Uy^ 

where fo G F m is an initial approach and this fo G Af(A m ) ± 

Most iterative methods for approximating / are based on a transformation of the normal 
equation into equivalent fixed point equations like 

f = f + A* m (A m f-y) 

If || A m || 2 < 2 then the corresponding fixed point operator / — A* m A m is nonexpansive 
and one may apply the method of successive approximations. It must be emphasized that 
/ — A* m A m is no contradiction if our inverse problem is ill-posed, since the spectrum of A* m A m 
clusters at the origin. 

5.2. Landweber iteration. 

In this subsection we presented the well-known Landweber iteration, which arises from 
converting the necessary conditions for minimizing (j2.1j) into a fixed point iteration. Much 
development in the last few years has taken place in advancing the theory of Landweber 
iteration for linear and nonlinear inverse problems. 

Using the terminology of the last sections, we introduce the function 

fc-i 

(5.2) Q k {\) = (1 - A) J = - (1 - \) k ) 

3=0 

We call Qk the iteration polynomial of degree k — 1. Associated with it is the polynomial 

r fc (A) = 1 - XQ k (X) = (1 - A) fc 
of degree k, which is called the residual polynomial since it determines the residual y — A m f m ^. 
Thus, inserting the equation ()5.2|) in ()5.1|) we obtain recursively, 

(5-3) fm,k+l = fm,k ~ ^n\-^-mfm,k ~ Vm): fc = 0, 1, . . . 

starting from an initial guess fo. This is a steepest descent method called the linear version 
of Landweber's iteration. Each step of the iterative process ()5.3|) is carried out along the 
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direction opposite to the direction of the gradient of the quadratic functional J(f) in (|2.1jl . 
It is known that there is the greatest decrease of the functional along this direction. 

If ||An|| < 1) we considerer A G (0, 1] such that in this interval AQfc(A) is uniformly bounded 
and since Qfc(A) converge to 1/A as k —>■ oo then according to Theorem 13.21 the sequences 
f m> k converge to f m when y G V(Al n ). If \\A m \\ is not bounded by one, then we introduce a 
relaxation parameter < r < ||A m ||~ 2 in front of A* m in (j5.3j) . i.e, we would iterate 

(5.4) fm,k+l = fm,k - TA* m (A m f m:k —y),k = 0,l,... 

If r = Tk, one can obtain various variants of the method of steepest descent depending on a 
choice of the sequence Tfe. The Landweber iteration (jf>.4j) is usually called a method of simple 
iteration. 

In the following we derive a simple estimate for the error propagation in the Landweber 
iteration. We then have the following result, 

Corollary 5.1. Let t = l/(2||A m || 2 ) < 1/Ai. If y G lZ(A m ), then the Landweber iteration is 
an order optimal regularization method, i. e, 

v 2 



m - h {m) \\ 2 < 2 Cl k^ + 2c 2 -(rA;p+ 1 )^ 



a 

n 

where a = p\±T and c 2 = ^-(|±i)(^+i)/4p. 



Proof. To apply Theorem 13.41 we have to study the terms of the bias, E||/ m — R a A m f\\ 2 , and 
variance JE\\R a A m f — fk( m )\\- By ()5.2|) we have 



fm ~ RaA m f — (I — A m A m Qk(A m A rn )) f m — (J — A m A rn ) f m 



We have to study the residual polynomial r&(A) = (1 — A) fc of the Landweber iteration. 
For < A < ||v4 m || 2 the function 

A"|l-AQ fc (A)| 
assumes its maximum for A = r _1 yu(/i + A;) -1 . 
Thus, we have 

A"|l — AQ A (A)| < max A M 1 1 — AQfc(A) | 

0^A< || Am || 

u M k k 

< 



t^(/j, + k)v (fi + k) k 



< ( — I k->' 
re 



This leads to numbers uj^k) as introduced in Theorem |3] 
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Thus, the term corresponding to the bias is bounded by 

\\j m - Ra Ajf < P \^y»k- 2 ». 

re 

Next, we establish bounds for the variance term. By assumption, the singular values satisfy 
\j £t j~ 2p . Note that for small values of Xj we have 

Q|(A) < {rkf Vj>m' 

and for big values of Xj (Xj ~ Ai) 

Ql(\ 3 )<\f Vj<m'. 

Consequently, 

m ml 

nTr(Ql(A m A*JA m Al) = ^Q^A, < J^AJ 1 + ^(rkfXj 

j=l j=l j>m' 

rm' pwI 

< \ s 2p ds + (rk) 2 / s~ 2p ds 
Jo Jo 

This suggest searching w! ~ c(r£;) 1 / 2p for p > 1/2, where c = (§^0 1 ^ 4p - Hence we have, 
n\R«Aj - f k{m) \\ 2 = Tr(Q 2 k (A m A*JA m A*J 

c 2p+l / r £\(2p+l)/2p 

~ 2p + 1 n 

Finally, this implies 

r 2 

n 

where c x = p 2 (±Y and c 2 = ^(|±I)(W)/4p 

□ 

Remark 5.2. Note that under the above inequality is satisfied if the dimension of the set is 

i 

such that d mo ~ n 4 ^p+ 2 p +1 . Here, the optimal choice of regularization sequence, depending on 

p and \i. The optimal rates are of order IE ||/ — /k( m )| | = 0(n «w+2p+ij. Analogous results are 
obtained in the ill-posed problem literature, see for example [5], where typically in a Hilbert 

2s 

scale setting optimal rates are of order 0{n 2s + 2 p+ 1 ), with s = 2pp. 

We are ready to state our main result for the Landweber iteration, which bounds the mean 
squared error of the select estimate f~ k basically by the smallest mean squared error among 
the estimates fk plus a remainder term of order 1/n. The result follows from Theorem 14.11 



E||/ m - fk [m) \\ 2 < 2 Cl k- 2 » + 2c 2 -(rkf p+ ^ 2p , 
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Corollary 5.3. Let r = l/(2||A m || 2 ) < 1/Ai. Next assume k as in ()4.2|) and d mo as in f!4.1|) . 
If ' y G lZ(A m ) then for any f G F m and any k, the following inequality holds true 



m\fm-m 2 <- 



1 



, inf 

(1 - V) keK 

rk 
1 



Aa 2 sr^ i~k 

+— E 



n 



C(l + v)\\f m -f k \\ 2 + 



drL k [c(r A;) 1 /* + 1] + 1 



2p+l 

, , 2ro 2 {l + L k ){c{rk)^r + rk) 



n 



-y/ dr L fc [c(rfc) 1 /2p + i] 



for some C > and c = ^(^) {2p+1)/4p ■ 

Proof. For fixed Xj and m we have that the terms of the trace and spectral radius are bounded 
by the follows expression 



(5.5) 
and 



j'=i 



3=1 



j>m' 



(5.6) 



p 2 (R k R k ) = max Q^(A,)Aj < max A - 1 + max(r£;) 2 Aj. 

j£m j<rn' j>m' 



Balancing both terms in ()5.5|) and ()5.6|) gives the optimal choice of the trace and the spectral 
radius, respectively. Thus, we have 



Tr{R{R k 



CjP+l) . 2p+l 

1 [2p+l\ 4 p (rk) 2 p 



and 



2p + 1 \2p- 1 

rA; 



n 



Note that the penalization term is roughly proportional to 



(2p+l) 

1 ( 2p + 1 \ 4 P 2p+l 

1 (rk) 2 p + rk 



2p+l\2p-l 



On the other hand 



p 2 (R a ) 2p + l\2p-l 
The result then follows directly from Theorem 14.11 



0„ _|_ 1 \ (2P+!)/ 4 P 

" ^ (r/,-)' 2 >\ 



□ 
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5.3. Nonlinear multistep iterative process. 

Many approximate methods widely used in practice are nonlinear. We cite a important 
example of nonlinear approximate method. We considerer a nonlinear multistep iterative 
process, which have error residual 

k 

l-AQ fc (A) = n(l"^ 1 A) 

i=l 

with r ik = T ik (f ,A,y) > 0, < r lk < r 2k . . . < r kk < A x . Then for A > 0, Q k (X) have the 
following representation 

k 

Q k (\) = \- 1 [l-l[(l-T^\)] 

i=l 

The following corollary is established. 

Corollary 5.4. Let r ik = T ik (f ,A,y) > 0, with < T Xk < r 2k . . . < r kk < A x . If y e K(A m ), 
then the nonlinear multistep iterative process is an order optimal regularization method, i.e, 

n\L~ Mm) f < 2 ci (J] + 2 c 2 £ (£ t£)<*»V* 

1=1 1=1 

where a = p 2 ^(li + l)" 1 and c 2 = 2^l(| ± f) (2p+1)/4p . 

Proof. As before, we investigate the behavior of the bias and the variance. 
In the relation 

k 

\»(l-\Q k (\)) = \»H(l-Tr k 1 \) 

8=1 

the least upper can not be reached at the points A = and A = r lfc , since the estimated 
function is not equal to zero identically. 

On the other hand 



k 

(5.7) [1 - AQ fc (A)]'[AQ fc (A) - l]" 1 = V — 



1=1 



Tik - A 



Since the function in the right-hand of ()5.7j) does not decrease as a function of A on the 
half-interval [0, Tik) then, the estimates of the Theorem 13.51 are valid. 

Thus, for < A < Tife, we have 



k 

sup \Q h (\)\ = QM = Y,^ 1 
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and 



sup A11-Ag fc (A)|<^( /U +l)- 1 (^r i ; 1 ; 

0<A<r lfe ~ 



Note that uo^k) = (Yli=i T i k l ) M - Thus, the bias is bounded by 



\\fm-Rc l Aj\\ 2 <c l {Y,rr l : 



-2fi 



i=l 



where c\ = p 2 fi 2fl (fi + 1) 



On the order hand, it is not difficult to see that 

k 



Tr(Ql(A m A^ n )A m A^ n ) < 



n 



-2p 



l<i<m' 



i=l j>m' 



< 



n 



m 



/2p+l 



2p + l 



,m 



/-2p+l" 



i=l 



2p-l 



This suggest searching 



/2p 



i=l 



Withc=(|±l) 1 /4P. 

Thus, we have that the term variance is bounded by 

n\R a Aj- f k{m) \\ = a 2 Tr(Ql(A* m A m )A* m A m ) < c 2 ^-(^ r^ 1 )®* 1 ^ 

where c 2 = ^(g^) (2p+1)/4p 
Finally we have 

E||/ m - / fcH II 2 < 2 Cl ( £ r?)-*> + 2 C2 £ (£ t?)<*+w 

1=1 1=1 
Balancing the bias and variance terms gives the optimal choice 



mfm-fk(m)\\ 2 = 0(n^+^+\ 



□ 



We have the following result. 



Corollary 5.5. Let be as in corollary \5.J\ . Next assume k as in ()4.2j) and d mo as in (|4.1j) . 
If y G 7£(A m ) i/ien /or any / e F m and any fc ; iae following inequality holds true 



20 



ITERATIVE METHODS FOR LINEAR INVERSE PROBLEMS 



C(l + z/)||/ m -/ fc || 2 + 



n 



4q 2 ^ Eti7-i fc 1 
n ^ d 

k 



\ 



drL k 



£0 1/2p +! 



i=i 



+ 1 



for some C > and c = _J_(Ie±1]C2p+i)/4 P 



2p+lV2p-l< 



Proof. First observe that 



Tr{R l k R k 



(2p+l) , , 

1 (2 P +l\ — (Eti^ J 2p 



2p+l 



and 



2p + 1 V 2 P- 1. 



P ( R k R k 



n 



Consequently 



Tr(RjR k ) _ 
p(Rk) ~ 2p 



(2p+l) fc 



+ l\2p-l J l 2-, ■» > 

x 7 2=1 



Note that both np 2 (R k ) and Tr(R k R k )/ p 2 (R k ) do not depend on n. The proof then follows 
directly from theorem 14.11 

□ 
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